Instructions

Below you will find several empty R code chunks and a few places where a line starts with the word “Answer:”. Your task is to fill in the required code and answer the questions as stated.

Eggs Dataset

Today you will be working with a dataset of birds:

Here is a full data dictionary describing all of the variables:

Notice that the last two variables are integer codes. They are stored as numbers but correspond to a category.

Starting plot

Create a scatter plot showing the mass of a male bird (x-axis) and the mass of an egg (y-axis):
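As a rough sketch of what this chunk might look like: the data frame name `birds` is hypothetical, and its seven rows are a stand-in copied from the tibble output printed in this handout, not the full dataset. The column names `male_mass` and `egg_mass` are the ones shown in that output.

```r
library(ggplot2)

# Stand-in data: a few rows copied from the output printed in this handout.
# `birds` is a hypothetical name; use whatever your dataset is called.
birds <- data.frame(
  name      = c("Swift", "Rock", "Ground", "Regent", "Superb", "King", "Royal"),
  egg_mass  = c(5.95, 4.85, 6.85, 9.40, 8.10, 306, 445),
  male_mass = c(64.7, 53.0, 78.0, 175, 153, 12800, 8840)
)

# Male mass on the x-axis, egg mass on the y-axis, one point per bird.
p_raw <- ggplot(birds, aes(x = male_mass, y = egg_mass)) +
  geom_point()
p_raw
```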

You should notice that the plot’s scale makes it hard to see the relationship between the two variables.

Changing the scale

Now add the layers scale_x_log10() and scale_y_log10() to the plot:
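One possible version of this step, again using a hypothetical `birds` stand-in built from rows printed in this handout. Both scale functions are added to the plot like any other layer, and the axis labels stay in grams while the spacing becomes logarithmic.

```r
library(ggplot2)

# Stand-in data copied from the printed output; `birds` is a hypothetical name.
birds <- data.frame(
  egg_mass  = c(5.95, 4.85, 6.85, 9.40, 8.10, 306, 445),
  male_mass = c(64.7, 53.0, 78.0, 175, 153, 12800, 8840)
)

# Same scatter plot, now with both axes on a log10 scale.
p_log <- ggplot(birds, aes(x = male_mass, y = egg_mass)) +
  geom_point() +
  scale_x_log10() +
  scale_y_log10()
p_log
```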

How would you now describe the relationship between the two variables (I just need one sentence here)?

Answer: The average mass of an egg increases as the mass of the adult male bird increases.

Parrots

Create a new dataset called parrots consisting of just those birds that are parrots (hint: use the type variable; double hint: look at the raw data for exactly how to format the filter query):

## # A tibble: 12 x 10
##    genus    species  name   type  egg_mass male_mass mating_system display
##    <chr>    <chr>    <chr>  <chr>    <dbl>     <dbl>         <int>   <int>
##  1 Aprosmi~ erythro~ Red-w~ Parr~    11.5      135               2       3
##  2 Lathamus discolor Swift  Parr~     5.95      64.7             2       3
##  3 Neophema chrysos~ Blue-~ Parr~     4.20      45.7             2       1
##  4 Neophema petroph~ Rock   Parr~     4.85      53.0             2       1
##  5 Neophema pulchel~ Turqu~ Parr~     3.90      42.7             2       1
##  6 Neopsep~ bourkii  Bourk~ Parr~     3.75      46.0             2       1
##  7 Pezopor~ wallicus Ground Parr~     6.85      78.0             2       1
##  8 Polytel~ alexand~ Alexa~ Parr~     7.75      96.0             3       3
##  9 Polytel~ anthope~ Regent Parr~     9.40     175               2       3
## 10 Polytel~ swainso~ Superb Parr~     8.10     153               2       2
## 11 Psephot~ haemato~ Red-r~ Parr~     4.50      61.4             2       1
## 12 Purpure~ spurius  Red-c~ Parr~     7.15     117               2       1
## # ... with 2 more variables: resource <int>, clutch_size <dbl>
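A minimal sketch of the filtering step, assuming dplyr is loaded. The printed output truncates the `type` values (for example `Parr~`), so the strings `"Parrot"`, `"Penguin"` and `"Albatross"` below are placeholders, as is the data frame name `birds`. Check the raw data for the exact spelling before writing your own filter.

```r
library(dplyr)

# Stand-in data; the `type` strings are placeholder assumptions because the
# printed output truncates them. `birds` is a hypothetical name.
birds <- tibble(
  name      = c("Swift", "Rock", "Ground", "Regent", "Superb", "King", "Royal"),
  type      = c(rep("Parrot", 5), "Penguin", "Albatross"),
  egg_mass  = c(5.95, 4.85, 6.85, 9.40, 8.10, 306, 445),
  male_mass = c(64.7, 53.0, 78.0, 175, 153, 12800, 8840)
)

# Keep only the rows whose type matches the parrot label exactly.
parrots <- filter(birds, type == "Parrot")
parrots
```

The comparison is case-sensitive and must match the whole string, which is why the exact format in the raw data matters.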

Now add a layer to the previous plot (keeping the log scales) where the parrots are highlighted in the color “red”. To make them stand out, give the base layer an alpha value of 0.15. Finally, add a text annotation telling the reader that the red points are parrots.
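A sketch of the highlighted plot under the same assumptions as before: `birds` is a hypothetical stand-in, and the `type` strings are placeholders for the truncated values. The annotation coordinates and wording are arbitrary choices for this small example, so pick a spot that is empty in your own plot.

```r
library(ggplot2)
library(dplyr)

# Stand-in data; `birds` and the `type` strings are placeholder assumptions.
birds <- tibble(
  type      = c(rep("Parrot", 5), "Penguin", "Albatross"),
  egg_mass  = c(5.95, 4.85, 6.85, 9.40, 8.10, 306, 445),
  male_mass = c(64.7, 53.0, 78.0, 175, 153, 12800, 8840)
)
parrots <- filter(birds, type == "Parrot")

p_hl <- ggplot(birds, aes(x = male_mass, y = egg_mass)) +
  geom_point(alpha = 0.15) +                      # faded base layer, all birds
  geom_point(data = parrots, color = "red") +     # parrots drawn on top in red
  scale_x_log10() +
  scale_y_log10() +
  annotate("text", x = 1000, y = 6,               # position given in data units
           label = "Parrots are shown in red", color = "red")
p_hl
```

The second geom_point() inherits the x and y mappings from ggplot() and only swaps in the smaller dataset.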

Smoothing line

Now, we are going to add a best-fit line to the plot. We do this by adding geom_smooth(method = "lm") to the plot. Add this to the plot using the log-log scale, but without highlighting the parrots.

I think the best-fit line is a bit too colorful and noisy. Fix it by changing the line to this instead: geom_smooth(method = "lm", color = "black", se = FALSE, linetype = "dashed", size = 0.5).

Does the best-fit line match the visual pattern you saw between the size of a bird and the size of its eggs (again, one sentence is sufficient)?

Answer: Yes, it shows a positive relationship between the mass of an egg and the mass of an adult bird.
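The quieter best-fit line might be added as follows, again on a hypothetical `birds` stand-in. One deliberate difference from the call quoted above: from ggplot2 3.4.0 onward, line thickness is set with `linewidth`, and `size` on a line produces a deprecation warning, so this sketch uses `linewidth = 0.5`. On an older ggplot2, keep `size = 0.5`.

```r
library(ggplot2)

# Stand-in data copied from the printed output; `birds` is a hypothetical name.
birds <- data.frame(
  egg_mass  = c(5.95, 4.85, 6.85, 9.40, 8.10, 306, 445),
  male_mass = c(64.7, 53.0, 78.0, 175, 153, 12800, 8840)
)

p_fit <- ggplot(birds, aes(x = male_mass, y = egg_mass)) +
  geom_point() +
  scale_x_log10() +
  scale_y_log10() +
  # Linear fit with no confidence band, drawn as a thin dashed black line.
  geom_smooth(method = "lm", color = "black", se = FALSE,
              linetype = "dashed", linewidth = 0.5)
p_fit
```

Because the scales are applied before the statistic is computed, the line is a linear fit of log10(egg_mass) on log10(male_mass).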

Outliers

If you look at the plot, you’ll see one bird in particular that has a very large egg size given the mass of the bird itself. This is the Red-tailed tropicbird (also, you can add pictures to R Markdown!):

The tropicbird has a male mass of 218.7g and an egg mass of 87.00g. Annotate this point on the graph and give a label for it:
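One way to mark the outlier, using the two masses stated above as the annotation coordinates. The `birds` data frame is a hypothetical stand-in with the tropicbird appended, and the red color and `vjust` offset are illustrative choices.

```r
library(ggplot2)

# Stand-in data; the last row is the tropicbird (218.7 g male, 87.00 g egg).
# `birds` is a hypothetical name.
birds <- data.frame(
  egg_mass  = c(5.95, 4.85, 6.85, 9.40, 8.10, 306, 445, 87.00),
  male_mass = c(64.7, 53.0, 78.0, 175, 153, 12800, 8840, 218.7)
)

p_out <- ggplot(birds, aes(x = male_mass, y = egg_mass)) +
  geom_point() +
  scale_x_log10() +
  scale_y_log10() +
  # Redraw the outlier in red, then place its label just above the point.
  annotate("point", x = 218.7, y = 87.00, color = "red") +
  annotate("text", x = 218.7, y = 87.00,
           label = "Red-tailed tropicbird", vjust = -1)
p_out
```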

Your turn

Construct one final graph of the data. You are free to use the other variables that we did not look at yet or to look at different classes of birds. For this graph (only), please add an appropriate title and annotations.

## # A tibble: 3 x 10
##   genus   species   name   type   egg_mass male_mass mating_system display
##   <chr>   <chr>     <chr>  <chr>     <dbl>     <dbl>         <int>   <int>
## 1 Apteno~ patagoni~ King   Pengu~      306     12800             2       1
## 2 Diomed~ epomopho~ Royal  Albat~      445      8840             2       3
## 3 Diomed~ exulans   Wande~ Albat~      484      9110             2       3
## # ... with 2 more variables: resource <int>, clutch_size <dbl>